-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
icp: Port AVX2 implementation of aes-gcm from BoringSSL #17058
base: master
Are you sure you want to change the base?
Conversation
940863a
to
fbfc371
Compare
Hey @AttilaFueloep I tried to port your changes in #9749 and I can't figure out what you mean by the change with the comment |
0ce5f7b
to
1b8867f
Compare
Strong opening move! I've had a little play with it, here's what I got. Before the final thing, you should run I added this patch to let me see what it thinks is happening on my test machines: diff --git module/zcommon/simd_stat.c module/zcommon/simd_stat.c
index d82a88ca9..da557bbb0 100644
--- module/zcommon/simd_stat.c
+++ module/zcommon/simd_stat.c
@@ -117,6 +117,10 @@ simd_stat_kstat_data(char *buf, size_t size, void *data)
"pclmulqdq", zfs_pclmulqdq_available());
off += SIMD_STAT_PRINT(simd_stat_kstat_payload,
"movbe", zfs_movbe_available());
+ off += SIMD_STAT_PRINT(simd_stat_kstat_payload,
+ "vaes", zfs_vaes_available());
+ off += SIMD_STAT_PRINT(simd_stat_kstat_payload,
+ "vpclmulqdq", zfs_vpclmulqdq_available());
off += SIMD_STAT_PRINT(simd_stat_kstat_payload,
"osxsave", boot_cpu_has(X86_FEATURE_OSXSAVE)); With that, my old Intel 2019 junker laptop says:
My much nicer Ryzen 5 from last year says:
Unfortunately we don't have visibility on the ICP microbenchmarks like we do for checksums and raidz, but we can at least see the options available:
So I'd say it's all wired up right, which is half the fun. I set it to
Trying to create the pool and dataset with
And the kernel has a nice complaint:
That's all I have time for tonight. This is a good start! |
Oh the other thing I forgot to add, after loading the module it says |
ae64131
to
c0a1e8e
Compare
module/icp/algs/modes/gcm.c
Outdated
} else { | ||
/* | ||
* Handle the "cycle" implementation by creating avx and | ||
* non-avx contexts alternately. | ||
*/ | ||
gcm_ctx->gcm_use_avx = gcm_toggle_avx(); | ||
gcm_ctx->gcm_use_avx2 = gcm_toggle_avx2(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty sure the cycle behaviour here isn't correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be fixed in #17061.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So does this PR have a dependency on #17061?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think cycle will still work, it just won't toggle the same way (or the assumed way) - is that behaviour documented somewhere? Under what circumstances would people be using cycle?
When this code runs on an AVX2 processor, cycle will only toggle between AVX2 and generic, ignoring AVX - I don't know if that is a deal breaker enough to consider it a "dependency"
AVX only processors should still cycle the same way.
(at least that's the intention - might have to think a bit deeper about this)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tonyhutter I've had another go at the cycle code - since movbe
only applies to AVX I've modified the toggling for that to not apply to AVX2. Furthermore, if bswap is needed neither accelerated method will be usable.
c0a1e8e
to
f0c81fa
Compare
This uses the AVX2 versions of the AESENC and PCLMULQDQ instructions; on Zen 3 this provides an up to 80% performance improvement. Original source: https://github.com/google/boringssl/blob/13840dd094f9e9c1b00a7368aa25e656554221f1/gen/bcm/aes-gcm-avx2-x86_64-linux.S See the original BoringSSL commit at google/boringssl@3b6e1be. Signed-off-by: Joel Low <[email protected]>
- Accept GCM H variable in network endianness (ICP convention) - Fix round count offset in AES_KEY struct (ICP convention) - Use RET macro for kernel code Signed-off-by: Joel Low <[email protected]>
f0c81fa
to
694012c
Compare
Signed-off-by: Joel Low <[email protected]>
Signed-off-by: Joel Low <[email protected]>
Motivation and Context
Zen 3 CPUs support the VAES and VPCLMULDQ instructions which extend the width of each instruction from 128-bits to 256-bits. BoringSSL has recently implemented this version for AES-GCM and it provides up to a 80% speedup. See google/boringssl@3b6e1be.
Description
I've backported the implementation from BoringSSL, adapting code from google/boringssl@3b6e1be (but picking the tip of master), as well as from google/boringssl@62f9751 which changed the primitive signature from 6 arguments to 7 (by not implicitly relying on the address offset of the ghash structure.)
Adaptations for icp (akin to #9749) as well as to use the RET macro for kernel code are in the second commit.
The third and fourth commits combine the use_avx/use_avx2 flags into an enum, allowing toggling of the different implementations that are available. Also, define different values of CAN_USE_GCM_ASM to indicate various levels of compiler support.
How Has This Been Tested?
Compile tested.
I'm now running it on my ZFS-on-root main machine.
@robn has a Wycheproof test set that he will be upstreaming soon; these changes pass.
Types of changes
Checklist:
Signed-off-by
.